{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Cart Pole Balancing with Random Policy\n",
    "\n",
    "Let's create an agent with the random policy, that is, we create the agent that selects the random action in the environment and tries to balance the pole. The agent receives +1 reward every time the pole stands straight up on the cart. We will generate over 100 episodes and we will see the return (sum of rewards) obtained over each episode. Let's learn this step by step.\n",
    "\n",
    "First, create our cart pole environment:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import gym\n",
    "env = gym.make('CartPole-v0')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "Set the number of episodes and number of time steps in the episode:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_episodes = 100\n",
    "num_timesteps = 50"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Episode: 0, Return: 23.0\n",
      "Episode: 10, Return: 12.0\n",
      "Episode: 20, Return: 23.0\n",
      "Episode: 30, Return: 15.0\n",
      "Episode: 40, Return: 19.0\n",
      "Episode: 50, Return: 10.0\n",
      "Episode: 60, Return: 16.0\n",
      "Episode: 70, Return: 10.0\n",
      "Episode: 80, Return: 22.0\n",
      "Episode: 90, Return: 38.0\n"
     ]
    }
   ],
   "source": [
    "#for each episode\n",
    "for i in range(num_episodes):\n",
    "    \n",
    "    #set the Return to 0\n",
    "    Return = 0\n",
    "    #initialize the state by resetting the environment\n",
    "    state = env.reset()\n",
    "    \n",
    "    #for each step in the episode\n",
    "    for t in range(num_timesteps):\n",
    "        #render the environment\n",
    "        env.render()\n",
    "        \n",
    "        #randomly select an action by sampling from the environment\n",
    "        random_action = env.action_space.sample()\n",
    "        \n",
    "        #perform the randomly selected action\n",
    "        next_state, reward, done, info = env.step(random_action)\n",
    "\n",
    "        #update the return\n",
    "        Return = Return + reward\n",
    "\n",
    "        #if the next state is a terminal state then end the episode\n",
    "        if done:\n",
    "            break\n",
    "    #for every 10 episodes, print the return (sum of rewards)\n",
    "    if i%10==0:\n",
    "        print('Episode: {}, Return: {}'.format(i, Return))\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Close the environment:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "env.close()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
